
Medical Decision Making

SAGE Publications

Preprints posted in the last 30 days, ranked by how well they match Medical Decision Making's content profile, based on 10 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
Simulation-Based Comparison of Controlled Interrupted Time Series (CITS) and Multivariable Regression

Orwa, F. O.; Mutai, C.; Nizeyimana, I.; Mwangi, A.

2026-04-13 health policy 10.64898/2026.04.10.26350670 medRxiv
Top 0.1%
8.5%

When randomized controlled trials are impractical, interrupted time series designs offer a rigorous quasi-experimental approach to assessing population-level policies. Indeed, among quasi-experimental designs (QEDs), the interrupted time series (ITS) method is commonly regarded as the most robust. However, interrupted time series designs are susceptible to serial correlation and to confounding by time-varying factors associated with both the intervention and the outcome, which may result in biased inference. We therefore provide a simulation-based comparison of controlled interrupted time series (CITS) and multivariable regression (multivariable negative binomial regression) for estimating policy effects in count time series data. Both approaches are widely used in policy evaluations, yet their comparative performance in typical population health settings has rarely been examined directly. We tested both approaches across a variety of data-generating scenarios differing in series length, intervention effect size, and magnitude of lag-1 autocorrelation. Performance was assessed in terms of bias, standard error calibration, confidence interval coverage, mean squared error, and statistical power. Both methods gave unbiased estimates for moderate and large intervention effects, although bias was more pronounced for small effects, particularly in short series. Although point estimate performance was similar, inferential properties differed substantially. CITS always had smaller mean squared error, better agreement between model-based and empirical standard errors, and confidence interval coverage near the nominal 95% level under weak to moderate autocorrelation. By contrast, multivariable regression was more sensitive to serial dependence, leading to underestimated standard errors and undercoverage, especially at moderate to high autocorrelation, even with Newey-West adjustments.
These findings show the benefits of using a concurrent control series and the importance of structurally accounting for serial correlation when studying population-level policies with time series data.
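The performance measures listed in this abstract are standard summaries computed over simulation replicates. The following is a minimal Python sketch, not the authors' code. It assumes that each replicate yields a point estimate and a model-based standard error, and that intervals are normal-approximation 95% Wald intervals. The function name `summarize_replicates` is hypothetical.

```python
import math
from statistics import mean, stdev

def summarize_replicates(estimates, std_errors, true_effect, z=1.96):
    """Summarize one estimator's replicates under one simulation scenario.

    estimates   -- point estimates, one per simulated data set
    std_errors  -- model-based standard errors from the same fits
    true_effect -- effect size used to generate the data
    """
    n = len(estimates)
    bias = mean(estimates) - true_effect
    empirical_se = stdev(estimates)      # spread of estimates across replicates
    mean_model_se = mean(std_errors)     # what the fitted models reported
    mse = mean((b - true_effect) ** 2 for b in estimates)
    covered = sum(
        1 for b, se in zip(estimates, std_errors)
        if b - z * se <= true_effect <= b + z * se
    )
    rejected = sum(1 for b, se in zip(estimates, std_errors) if abs(b / se) > z)
    return {
        "bias": bias,
        # Ratio < 1: model SEs understate true sampling variability, the
        # pattern the abstract reports for regression under autocorrelation.
        "se_ratio": mean_model_se / empirical_se,
        "mse": mse,
        "coverage": covered / n,
        "power": rejected / n,           # rejection rate of H0: effect = 0
    }
```

In a full study, this function would be called once per method and scenario (series length, effect size, autocorrelation) on the replicate results.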

2
From Protocol to Analysis Plan: Development and Validation of a Large Language Model Pipeline for Statistical Analysis Plan Generation using Artificial Intelligence (SAPAI)

Jafari, H.; Chu, P.; Lange, M.; Maher, F.; Glen, C.; Pearson, O. J.; Burges, C.; Martyn, M.; Cross, S.; Carter, B.; Emsley, R.; Forbes, G.

2026-03-19 health systems and quality improvement 10.64898/2026.03.19.26348626 medRxiv
Top 0.1%
6.2%

Background: Statistical Analysis Plans (SAPs) are essential for trial transparency and credibility but are resource-intensive to produce. While Large Language Models (LLMs) have shown promise in drafting protocols, their ability to generate high-quality, protocol-compliant SAPs remains untested against current content guidance. This study developed and validated an LLM-based pipeline for drafting SAPs from clinical trial protocols. Methods: We developed a structured, section-by-section prompting pipeline aligned with standard SAP guidance. We applied this pipeline to nine clinical trial protocols using three leading LLMs: OpenAI GPT-5, Anthropic Claude Sonnet 4, and Google Gemini 2.5 Pro. The resulting 27 SAPs were evaluated against a 46-item quality checklist derived from the published SAP guidelines. Items were double-scored for accuracy by independent trial statisticians on a 0 to 3 scale. We compared performance across LLMs and between item types (descriptive vs. statistical reasoning) using mixed-effects logistic regression. Results: Across the nine trials, the models produced SAP drafts with high overall accuracy (77% to 78%). Performance did not differ between the three LLMs (p = 0.79) but varied by content type (p < 0.001). All models performed well on descriptive items (e.g., administrative details, trial design), with lower accuracy for items requiring statistical reasoning (e.g., modelling strategies, sensitivity analyses). Accuracy for statistical items ranged from 67% to 72%, whereas descriptive items achieved 81% to 83% accuracy. Qualitatively, models were prone to specific failure modes in complex sections, such as omitting necessary details for secondary outcome models or hallucinating sensitivity analyses. Discussion: Current LLMs can effectively draft portions of SAPs, offering the potential for substantial time savings in trial documentation. 
However, a human-in-the-loop approach remains mandatory; while models demonstrate strong capability in producing descriptive content, their independent application to complex statistical methodology design still requires further methodological development and training. Future work should explore advanced prompt engineering, such as retrieval-augmented generation or agentic workflows, to improve reasoning capabilities.

3
Economic value of resistance-guided gonorrhea treatment: cost-neutrality thresholds for resistance test pricing in the United States

Nichols, B. E.; Wonderly Trainor, B.; Hampson, G.; Grad, Y. H.; Klausner, J. D.

2026-04-07 health economics 10.64898/2026.04.07.26350302 medRxiv
Top 0.1%
5.0%

Background: Rising antimicrobial resistance in Neisseria gonorrhoeae threatens the effectiveness of existing therapies. Resistance-guided treatment (RGT) may reduce treatment failures, complications, and inappropriate use of last-line agents while slowing resistance emergence. Methods and Findings: We developed an individual-level stochastic simulation model of gonorrhea diagnosis and treatment in the United States, incorporating infection prevalence, symptom status, diagnostic accuracy, resistance profiles, treatment pathways, and partner management (costs in 2025 USD). We evaluated three resistance testing strategies (ciprofloxacin-only, ciprofloxacin+ceftriaxone, and triple-target, the last including a novel drug A) across a wide range of resistance scenarios. We quantified economic value across three dimensions: (1) per-episode direct medical cost savings, (2) system-level costs attributable to ceftriaxone resistance emergence among men who have sex with men (MSM), and (3) avoided costs of new antibiotic development. From these, we estimated the maximum per-test price at which RGT remains cost-neutral. Per-episode cost-neutrality thresholds ranged from near $0 when ceftriaxone resistance was absent to as much as $45/test at 15% ceftriaxone resistance. At 50% ciprofloxacin and 5% ceftriaxone resistance, the population-weighted threshold was $4 (95% UI: $3-$8) for a CIP-only test and $11 (95% UI: $5-$14) for a triple-target test. Among MSM, incorporating system-level resistance emergence costs and avoided antibiotic development costs increased the total per-test value to $35-$145 for a single-target test and $84-$128 for a triple-target test, depending on whether prescribing practices shift when ceftriaxone resistance reaches 5%. 
Conclusions: Resistance-guided therapy offers economic benefits across multiple dimensions even at relatively high diagnostic prices, supporting investment in gonorrhea resistance testing to improve partner outcomes, delay resistance emergence, and enhance the long-term cost-efficiency of gonorrhea management.
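A per-test cost-neutrality threshold is the test price at which expected per-episode costs under RGT equal those under empiric treatment. The following minimal Python sketch uses a one-step decision tree, not the authors' stochastic individual-level model. It assumes perfect test specificity, an alternative drug that always cures detected resistant infections, and a single lump "failure cost" for missed resistant infections. All function names, parameters and numbers are hypothetical.

```python
def expected_cost_standard(p_resistant, drug_cost, failure_cost):
    """Empiric therapy for everyone; resistant infections fail and incur failure costs."""
    return drug_cost + p_resistant * failure_cost

def expected_cost_rgt_excluding_test(p_resistant, drug_cost, alt_drug_cost,
                                     test_sensitivity, failure_cost):
    """Resistance-guided therapy, before adding the price of the test itself.

    Detected resistant infections receive an alternative drug (assumed curative);
    missed resistant infections fail as under empiric therapy. Perfect
    specificity is assumed for simplicity.
    """
    detected = p_resistant * test_sensitivity
    missed = p_resistant - detected
    return (1 - detected) * drug_cost + detected * alt_drug_cost + missed * failure_cost

def cost_neutral_test_price(p_resistant, drug_cost, alt_drug_cost,
                            test_sensitivity, failure_cost):
    """Highest per-test price at which RGT costs no more than empiric therapy."""
    return (expected_cost_standard(p_resistant, drug_cost, failure_cost)
            - expected_cost_rgt_excluding_test(p_resistant, drug_cost, alt_drug_cost,
                                               test_sensitivity, failure_cost))
```

In this toy model, the threshold simplifies to `p_resistant * test_sensitivity * (failure_cost + drug_cost - alt_drug_cost)`. The threshold is therefore zero when resistance is absent and grows linearly with resistance prevalence. This qualitatively mirrors the "near $0 when ceftriaxone resistance was absent" pattern above, but it does not reproduce the paper's figures.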

4
HAARF: Healthcare AI Agents Regulatory Framework - A Comprehensive Security Verification Standard for Autonomous AI Systems in Clinical Environments

Schwoebel, J.; Frasch, M.; Spalding, A.; Sewell, E.; Englert, P.; Halpert, B.; Overbay, C.; Semenec, I.; Shor, J.

2026-04-13 health systems and quality improvement 10.64898/2026.04.09.26350519 medRxiv
Top 0.1%
2.6%

As health systems begin deploying autonomous AI agents that make independent clinical decisions and take direct actions within care workflows, ensuring patient safety and care quality requires governance standards that go beyond existing medical device frameworks designed for human-in-the-loop prediction tools. This paper introduces the Healthcare AI Agents Regulatory Framework (HAARF), a comprehensive verification standard for autonomous AI systems in clinical environments, developed collaboratively with 40+ international experts spanning regulatory authorities, clinical organizations, and AI security specialists. HAARF synthesizes requirements from nine major regulatory frameworks (FDA, EU AI Act, Health Canada, UK MHRA, NIST AI RMF, WHO GI-AI4H, ISO/IEC 42001, OWASP AISVS, IMDRF GMLP) into eight core verification categories comprising 279 specific requirements across three risk-based implementation levels. The framework addresses critical gaps in health system readiness for autonomous AI, including: (1) progressive autonomy governance with clinical accountability, (2) tool-use security for agents that independently access EHRs, medical devices, and clinical systems, (3) continuous equity monitoring and bias mitigation across diverse patient populations, and (4) clinical decision traceability preserving human oversight authority. We validate HAARF's enforcement capabilities through a scenario-based red-team evaluation comprising six adversarial scenarios executed under baseline (no middleware) and HAARF-guardrailed conditions (N = 50 trials each, Gemini 2.5 Flash primary with Claude Sonnet 4.6 cross-model validation). In baseline conditions, the agent model executes unauthorized tools in 56-60% of adversarial trials. Under the HAARF condition, deterministic middleware enforcement reduces the unauthorized-tool success rate to 0%, with 0% contraindication misses and 0% policy-injection success (95% Wilson CI [0.00, 0.07]). 
Cross-model validation confirms identical security metrics, supporting HAARF's model-agnostic design. Mapping analysis demonstrates 48-88% coverage of major regulatory frameworks, with per-category FDA alignment ranging from 73% (C5, Agent Registration) to 91% (C3, Cybersecurity; C7, Bias & Equity). Initial validation with healthcare organizations shows a 40-60% reduction in multi-jurisdictional compliance burden and improved clinical safety governance outcomes. HAARF provides health systems with a practical, risk-stratified pathway for safe AI agent deployment, shifting from reactive compliance to proactive quality governance while maintaining rigorous patient safety standards and human-centered care principles.
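The reported interval for zero events can be sanity-checked with the Wilson score interval. The following minimal Python sketch assumes z = 1.96 for a 95% interval and assumes that the interval refers to 0 successes in 50 trials. The function name is illustrative.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p_hat + z ** 2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z ** 2 / (4 * trials ** 2)
    )
    # Clamp tiny floating-point excursions outside [0, 1].
    return max(0.0, centre - half_width), min(1.0, centre + half_width)
```

`wilson_interval(0, 50)` gives a lower bound of 0 and an upper bound of about 0.0713, which rounds to the [0.00, 0.07] quoted above. Unlike the Wald interval, the Wilson interval does not collapse to a zero-width interval when no events are observed.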

5
Building a Resilient Antibiotic Market: India Sets the Pace. An econometric modelling approach to estimate revenues in Indian private markets for a novel, broad-spectrum intravenous antibiotic

Maitreyi, L.; Rajagopal, S.; Anandkumar, A.; Datta, S.

2026-03-22 health economics 10.64898/2026.03.13.26348309 medRxiv
Top 0.1%
2.5%

India faces a mounting health crisis from antibiotic resistance, coupled with global pharmaceutical hesitancy to invest in novel antibiotic research and development (R&D), driven by complex scientific and financial hurdles. India carries one of the world's largest absolute burdens of drug-resistant infections. The combination of a huge infectious-disease caseload, rapid urbanisation, and gaps in sanitation and primary care means that, when resistance emerges, it affects far more patients and generates a much larger pool of patients needing advanced antibiotics than in many high-income countries. Against this backdrop, demand for truly novel, broad-spectrum antibiotics in India is surging, fueled by rising multidrug-resistant infections, overstretched hospitals, and an antibiotic resistance market projected to grow rapidly over the next decade. Most countries respond with incentives and subscription models; for India, the answer lies in bold, innovative revenue strategies and in prioritising the domestic launch of novel antibiotics. This paper presents an econometric analysis of the estimated valuation of a novel broad-spectrum antibiotic in India that, as a single therapeutic agent, can address several major hospital-acquired infections, including complicated urinary tract infections (cUTI), hospital-acquired pneumonia (HAP), and ventilator-associated pneumonia (VAP). The model focuses on a hypothetical "ideal" broad-spectrum intravenous antibiotic and recommends that India pioneer market entry, highlighting financial models that maximise early revenues while still hardwiring stewardship. Launching new antibiotics first in India can catalyse robust real-world use, strengthen domestic pharma, and demonstrate that the economics of antibiotic innovation are viable. 
This decisive shift can transform India from a passive recipient of ageing drugs into the crucible where the next generation of life-saving antibiotics is forged, anchoring antibiotic research at the core of the country's health security and economic resilience.

6
Uncertainty Aware Decision Support with Computationally Expensive Simulation Models: A Case Study of HIV Intervention Scenarios

Fadikar, A.; Hotton, A.; de Lima, P. N.; Vardavas, R.; Collier, N.; Jia, K.; Rimer, S.; Khanna, A.; Schneider, J.; Ozik, J.

2026-04-17 hiv aids 10.64898/2026.04.15.26350970 medRxiv
Top 0.1%
1.9%

Detailed agent-based simulations are increasingly used to support policy decisions, but their computational cost and complex uncertainty structure make systematic scenario analysis challenging. We present a data-driven, uncertainty-aware decision support (DDUADS) workflow for using stochastic simulation models as decision-support tools under limited computational budgets. The approach combines several established techniques (sensitivity screening, Bayesian calibration using simulation-based inference, and multi-surrogate model integration for translational efficiency) into a coherent pipeline that enables uncertainty-aware policy analysis. Rather than producing a single baseline, the calibration stage yields a posterior distribution over plausible model parameterizations, allowing flexible, uncertainty-aware forward projections. We demonstrate the DDUADS workflow on the INFORM-HIV agent-based model of HIV transmission in Chicago to evaluate potential disruptions in antiretroviral therapy (ART) and pre-exposure prophylaxis (PrEP) use. While the specific application is HIV modeling, the challenges and techniques described here arise in other simulation studies and can be applied to decision support in other domains.
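The abstract does not say which simulation-based inference algorithm is used, so the following sketch substitutes rejection approximate Bayesian computation (ABC), one of the simplest SBI methods. A toy binomial simulator stands in for an expensive agent-based model such as INFORM-HIV. The sketch shows the idea described above: calibration keeps a set of plausible parameter values rather than a single baseline, and each value is then projected forward. All function names, the Uniform(0, 1) prior and the tolerance are hypothetical.

```python
import random
from statistics import mean

def simulator(p, n_trials, rng):
    """Toy stochastic 'model': number of events in n_trials with event probability p."""
    return sum(1 for _ in range(n_trials) if rng.random() < p)

def abc_rejection(observed, n_trials, n_draws, tolerance, seed=0):
    """Rejection ABC: keep prior draws whose simulated output is close to the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()                               # Uniform(0, 1) prior draw
        if abs(simulator(p, n_trials, rng) - observed) <= tolerance:
            accepted.append(p)
    return accepted                                    # approximate posterior sample

def posterior_predictive(samples, n_trials, seed=1):
    """Forward-project every accepted parameter, giving a spread of outcomes
    instead of a single baseline run."""
    rng = random.Random(seed)
    return [simulator(p, n_trials, rng) for p in samples]

posterior = abc_rejection(observed=15, n_trials=50, n_draws=5000, tolerance=1)
projections = posterior_predictive(posterior, n_trials=50)
print(len(posterior), round(mean(posterior), 3))
```

Rejection ABC wastes most simulations, which is why workflows for costly models typically add screening and surrogate models, as the abstract describes.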

7
Educational Browser-Native SIR Simulation: Analytical Benchmarks Showing Numerical Accuracy for Lightweight Epidemic Modeling

Ben-Joseph, J.

2026-04-17 epidemiology 10.64898/2026.04.15.26350961 medRxiv
Top 0.1%
1.7%

Lightweight epidemic calculators are widely used for teaching and rapid scenario exploration, yet many omit the methodological detail needed for scientific reuse. We present a browser-native SIR calculator that exposes forward Euler and classical fourth-order Runge–Kutta (RK4) integration alongside epidemiologically interpretable outputs and a population-conservation diagnostic. The implementation is anchored to analytical properties of the deterministic SIR system, including the epidemic threshold, the peak condition, and the final-size relation. Benchmark experiments show that RK4 is essentially step-size invariant over practical discretizations, whereas Euler at a coarse one-day step overestimates peak prevalence by 3.97% and final size by 0.66% relative to a fine-step RK4 reference. These results demonstrate that browser-based tools can support publication-quality computational narratives when solver choice, diagnostics, and assumptions are treated as first-class outputs.
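The following is a minimal Python sketch of the kind of benchmark described above, not the paper's browser implementation. It assumes a normalized SIR model (S, I and R as population fractions) with illustrative parameters β = 0.3 and γ = 0.1 (R0 = 3). It compares forward Euler and RK4 against two analytical properties:

- The peak condition: I is maximal when S = γ/β.
- The final-size relation: ln(s0/s∞) = R0(1 - s∞).

It also tracks S + I + R as a conservation diagnostic.

```python
import math

def sir_rhs(s, i, r, beta, gamma):
    """Derivatives of the normalized SIR model (S, I, R as population fractions)."""
    infection = beta * s * i
    recovery = gamma * i
    return -infection, infection - recovery, recovery

def euler_step(state, dt, beta, gamma):
    """One forward Euler step."""
    derivs = sir_rhs(*state, beta, gamma)
    return tuple(x + dt * dx for x, dx in zip(state, derivs))

def rk4_step(state, dt, beta, gamma):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(x + h * dx for x, dx in zip(state, k))
    k1 = sir_rhs(*state, beta, gamma)
    k2 = sir_rhs(*shifted(k1, dt / 2), beta, gamma)
    k3 = sir_rhs(*shifted(k2, dt / 2), beta, gamma)
    k4 = sir_rhs(*shifted(k3, dt), beta, gamma)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def simulate(step, beta, gamma, i0, t_end, dt):
    """Integrate from (1 - i0, i0, 0); return peak I, final state, worst |S+I+R-1|."""
    state = (1.0 - i0, i0, 0.0)
    peak, drift = i0, 0.0
    for _ in range(round(t_end / dt)):
        state = step(state, dt, beta, gamma)
        peak = max(peak, state[1])
        drift = max(drift, abs(sum(state) - 1.0))
    return peak, state, drift

def analytic_peak(beta, gamma, s0, i0):
    """Peak prevalence, reached when S = gamma / beta (requires R0 * s0 > 1)."""
    r0 = beta / gamma
    return s0 + i0 - (1.0 + math.log(r0 * s0)) / r0

def final_susceptible(beta, gamma, s0, iterations=200):
    """Solve the final-size relation ln(s0 / s) = R0 * (1 - s) by bisection."""
    r0 = beta / gamma
    lo, hi = 1e-12, 1.0 / r0   # the root lies below the peak threshold 1/R0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if math.log(s0 / mid) - r0 * (1.0 - mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With these settings and i0 = 1e-4, a 0.1-day RK4 run should match the analytical peak to well under 10^-4, while a one-day Euler run drifts further. The size of the Euler error depends on β, γ and the step, so the paper's 3.97% figure should not be expected to reproduce here.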

8
Ad-verse Effects: Pharmaceutical Advertising Shifts Drug Recommendations by Consumer-Facing AI

Omar, M.; Agbareia, R.; McGreevy, J.; Zebrowski, A.; Ramaswamy, A.; Gorin, M.; Anato, E. M.; Glicksberg, B. S.; Sakhuja, A.; Charney, A.; Klang, E.; Nadkarni, G.

2026-04-16 health policy 10.64898/2026.04.14.26350868 medRxiv
Top 0.1%
1.7%

Large language models are increasingly used for clinical guidance while their parent companies introduce advertising. We tested whether pharmaceutical ads embedded in the prompts of 12 models from OpenAI, Anthropic, and Google shift drug recommendations, across 258,660 API calls and four experiments probing distinct epistemic conditions. When two drugs were both guideline-appropriate, advertising shifted selection of the advertised drug by +12.7 percentage points (P < 0.001), with some model-scenario pairs shifting from 0% to 100%. Google models were the most susceptible (+29.8 pp), followed by OpenAI (+10.9 pp), while Anthropic models showed minimal change (+2.0 pp). When the advertised product lacked evidence or was clinically suboptimal, models resisted. This reveals a structured vulnerability: advertising does not override medical knowledge but fills the space where clinical evidence is underdetermined. An open-response subanalysis (2,340 calls across three representative models) confirmed that advertising restructures free-text clinical reasoning: models echoed ad claims at 2.7 times the baseline rate while maintaining high stated confidence and rarely disclosing the ad. Because this bias operates within clinically correct answers, it is invisible to accuracy-based evaluation, identifying a class of AI safety vulnerability that standard testing cannot detect.

9
PRAM: Post-hoc Retrieval Augmentation for Parameter-Free Domain Adaptation of ICU Clinical Prediction Models

Jeong, I.; Lee, T.; Kim, B.; Park, J.-H.; Kim, Y.; Lee, H.

2026-04-05 health systems and quality improvement 10.64898/2026.04.03.26350132 medRxiv
Top 0.1%
1.5%

Background: Clinical prediction models degrade when deployed across hospitals, yet retraining requires technical expertise, labeled data, and regulatory re-approval. We investigated whether post-hoc retrieval augmentation of a frozen model's output, analogous to retrieval-augmented methods in natural language processing, can mitigate this degradation without any parameter modification. Methods: We developed the Post-hoc Retrieval Augmentation Module (PRAM), which combines predictions from a frozen base model with outcome information retrieved from similar patients in a local patient bank. Five base models (logistic regression through CatBoost) and three retrieval strategies were evaluated on 116,010 ICU patients across three databases (MIMIC-IV, MIMIC-III, eICU-CRD) for acute kidney injury (AKI) and mortality prediction. A bank-size deployment simulation modeled performance from zero to full local data accumulation, complemented by source-bank cold-start, stress-test, and calibration experiments. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC). Results: Retrieval benefit was inversely associated with base model complexity (ρ = -0.90 for AKI, -1.00 for mortality): simpler models benefited more, consistent with retrieval capturing residual signal unexploited by the base model. PRAM showed a statistically significant monotone dose-response between bank size and prediction performance across all six outcome-target combinations (Kendall τ trend test, q = 0.031 for all). At the pre-specified primary comparison (bank = 5,000), the improvement was confirmed for the two largest-shift settings (eICU-CRD AKI: ΔAUROC = +0.012, q < 0.001; eICU-CRD mortality: ΔAUROC = +0.026, q < 0.001). Pre-loading a source bank bridged the cold-start gap, providing an immediate performance gain equivalent to approximately 2,000 to 5,000 local patients. 
Conclusions: PRAM provides a parameter-free adaptation mechanism that requires no model retraining, gradient computation, or regulatory re-evaluation at the deployment site. Effect sizes were modest and did not reach cross-model superiority, but the consistent dose-response pattern and the absence of retraining requirements establish retrieval-based adaptation as a viable approach for clinical model transportability. The retrieval mechanism additionally opens a pathway toward case-based interpretability, where predictions are accompanied by identifiable similar patients from the deploying institution.
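The abstract does not specify how PRAM combines the frozen model's output with retrieved outcomes. The following hypothetical Python sketch shows one simple post-hoc scheme: linearly blend the base probability with the outcome rate among the k nearest patients in a local bank, and let the retrieval weight grow with bank size. It illustrates the "no retraining" idea and the bank-size dose-response; it is not the authors' module. The function names, the Euclidean distance, `max_weight` and `saturation` are all assumptions. Bank entries are `(feature_vector, outcome)` pairs.

```python
import math

def knn_outcome_rate(query, bank, k):
    """Mean outcome among the k nearest bank patients (Euclidean distance)."""
    nearest = sorted(bank, key=lambda rec: math.dist(query, rec[0]))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def retrieval_augmented_risk(base_prob, query, bank, k=3,
                             max_weight=0.5, saturation=5000):
    """Blend a frozen model's probability with a retrieved local outcome rate.

    The retrieval weight grows linearly with bank size up to `saturation`
    (a hypothetical schedule). An empty bank returns the base prediction
    unchanged, and no model parameters are modified.
    """
    if not bank:
        return base_prob
    w = max_weight * min(1.0, len(bank) / saturation)
    return (1 - w) * base_prob + w * knn_outcome_rate(query, bank, k)
```

In a real deployment, features would be standardized before distances are computed, and the weight schedule would be tuned on held-out local data.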

10
Causal estimands and target trials for the effect of lag time to treatment of cancer patients

Goncalves, B. P.; Franco, E. L.

2026-04-08 epidemiology 10.64898/2026.04.07.26350338 medRxiv
Top 0.2%
1.3%

Timeliness of therapy initiation is a fundamental determinant of outcomes for many medical conditions, most importantly, cancer. Yet, existing inefficiencies in healthcare systems mean that delays between diagnosis and treatment frequently adversely affect the clinical outcome for cancer patients. Although estimates of effects of lag time to therapy would be informative to policymakers considering resource allocation to minimize delays in oncology, causal methods are seldom explicitly discussed in epidemiologic analyses of these lag times. Here, we propose causal estimands for such studies, and outline the protocol of a target trial that could be emulated with observational data on lag times. To illustrate the application of this approach, we simulate studies of lag time to treatment under two scenarios: one in which indication bias (Waiting Time Paradox) is present and another in which it is absent. Although our discussion focuses on oncologic outcomes, components of the proposed target trial could be adapted to study delays for other medical conditions. We believe that the clarity with which causal questions are posed under the target trial emulation framework would lead to improved quantification of the effects of lag times in oncology, and hence to better informed policy decisions.

11
Causal analyses using education-health linked data for England: a case study

De Stavola, B. L. L.; Aparicio Castro, A.; Nguyen, V. G.; Lewis, K. M.; Dearden, L.; Harron, K.; Zylbersztejn, A.; Shumway, J.; Gilbert, R.

2026-03-19 health policy 10.64898/2026.03.13.26348340 medRxiv
Top 0.2%
1.1%

Introduction: This article summarises lessons learnt from the Health Outcomes for young People throughout Education (HOPE) Study and serves as a real-world, transferable application for addressing causal questions using administrative data. The HOPE study applied causal methods to analyses of administrative data in Education and Child Health Insights from Linked Data (ECHILD), with the aim of studying the effectiveness of provision for special educational needs and disability (SEND) on health and education outcomes. Methods: Defining causal questions regarding the impact of SEND provision required judicious mapping of the question onto the data. This led to the selection of appropriate measures of effect, transparent handling of the data, and control of confounding factors to estimate effects. We adopted the target trial emulation framework to guide these steps. Having encountered specific computational challenges in estimating the effects of interest, we simulated data that resembled the HOPE study. We used these data to practise implementing alternative estimation methods and to study the impact of some of their assumptions. Results: The creation and analysis of the simulated data provided valuable insights. First, we learned the importance of aligning the target of estimation with the causal question at hand. Second, we observed how deviations from assumptions specific to each estimation method can affect results. Third, we highlighted the benefits of employing alternative estimation methods as sensitivity tools that can aid the interpretation of the resulting estimates. Finally, we offer user-friendly code in two programming languages (R and Stata) and accompanying simulated data to facilitate the implementation of these methods for similar causal questions. 
Conclusion: We recommend that users of administrative data fully specify (and possibly revise) the causal questions they wish to address, and carefully examine and compare the assumptions, implementation, and results obtained using alternative estimation methods.
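The HOPE code is provided in R and Stata. The following is an independent Python sketch of the underlying idea of comparing alternative estimators on simulated data. With a single binary confounder L and saturated (stratum-frequency) models, standardization (the g-formula) and Horvitz-Thompson inverse probability weighting (IPW) estimate the same average treatment effect. In richer settings, disagreement between them can point to model misspecification or sparse data. The cell counts are hypothetical and were chosen so that the true risk difference is 0.1 in both strata, while the crude comparison is confounded.

```python
from collections import defaultdict

def make_records(counts):
    """Expand {(L, A, Y): n} cell counts into individual (L, A, Y) records."""
    return [cell for cell, n in counts.items() for _ in range(n)]

def g_formula_ate(records):
    """Standardization: average stratum-specific risk differences over P(L)."""
    outcomes = defaultdict(list)
    for l, a, y in records:
        outcomes[(l, a)].append(y)
    n = len(records)
    ate = 0.0
    for l in {rec[0] for rec in records}:
        n_l = len(outcomes[(l, 0)]) + len(outcomes[(l, 1)])
        risk1 = sum(outcomes[(l, 1)]) / len(outcomes[(l, 1)])
        risk0 = sum(outcomes[(l, 0)]) / len(outcomes[(l, 0)])
        ate += (n_l / n) * (risk1 - risk0)
    return ate

def ipw_ate(records):
    """Horvitz-Thompson IPW with a saturated propensity score P(A=1 | L)."""
    n_l, n_l_treated = defaultdict(int), defaultdict(int)
    for l, a, _ in records:
        n_l[l] += 1
        n_l_treated[l] += a
    total1 = total0 = 0.0
    for l, a, y in records:
        e = n_l_treated[l] / n_l[l]          # estimated propensity score
        if a == 1:
            total1 += y / e
        else:
            total0 += y / (1 - e)
    return (total1 - total0) / len(records)

counts = {(0, 1, 1): 6, (0, 1, 0): 14, (0, 0, 1): 16, (0, 0, 0): 64,
          (1, 1, 1): 35, (1, 1, 0): 35, (1, 0, 1): 12, (1, 0, 0): 18}
records = make_records(counts)
print(round(g_formula_ate(records), 6), round(ipw_ate(records), 6))
```

Both estimates equal 0.1 here. The crude treated-versus-untreated difference (41/90 - 28/110, about 0.20) is roughly twice as large because treatment is concentrated in the higher-risk stratum.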

12
An Empirical Assessment of Inferential Reproducibility of Linear Regression in Health and Biomedical Research Papers

Jones, L.; Barnett, A.; Hartel, G.; Vagenas, D.

2026-04-07 health systems and quality improvement 10.64898/2026.04.07.26350296 medRxiv
Top 0.2%
0.9%

Background: In health research, variability in modelling decisions can lead to different conclusions even when the same data are analysed, a challenge known as inferential reproducibility. In linear regression analyses, incorrect handling of key assumptions, such as normality of the residuals and linearity, can undermine reproducibility. This study examines how violations of these assumptions influence inferential conclusions when the same data are reanalysed. Methods: We randomly sampled 95 health-related PLOS ONE papers from 2019 that reported linear regression in their methods. Data were available for 43 papers, and 20 were assessed for computational reproducibility, with three models per paper evaluated. The 14 papers with at least one model that was at least partially computationally reproduced were then examined for inferential reproducibility. To assess the impact of assumption violations, differences in coefficients, 95% confidence intervals, and model fit were compared. Results: Of the 14 papers assessed, only three were inferentially reproducible. The most frequently violated assumptions were normality and independence, each occurring in eight papers. Violations of independence were particularly consequential and were commonly associated with inferential failure. Although reproduced analyses often retained the same binary statistical significance classification as the original studies, confidence intervals were frequently wider, indicating greater uncertainty and reduced precision. Such uncertainty may affect the interpretation of results and, in turn, influence treatment decisions and clinical practice. Conclusion: Our findings demonstrate that substantial violations of key modelling assumptions often went undetected by authors and peer reviewers and, in many cases, were associated with inferential reproducibility failure. This highlights the need for stronger statistical education and greater transparency in modelling decisions. 
Rather than applying rigid or misinformed rules, such as incorrectly testing the normality of the outcome variable, researchers should adopt modelling frameworks guided by the research question and the study design. When assumptions are violated, appropriate alternatives, such as robust methods, bootstrapping, generalized linear models, or mixed-effects models, should be considered. Given that assumption violations were common even in relatively simple regression models, early and sustained collaboration with statisticians is critical for supporting robust, defensible, and clinically meaningful conclusions.
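Independence violations were the most consequential in this study. One quick way to gauge how much an analysis leans on the independence assumption is to compare the classical OLS standard error with a sandwich (robust) standard error. The following minimal Python sketch for simple linear regression is illustrative, not the authors' reanalysis code. It returns the classical standard error and either the heteroskedasticity-robust HC0 estimator or, when cluster labels are supplied, the CR0 cluster-robust estimator, which allows errors to be correlated within clusters (no small-sample correction). The function name and data are hypothetical, and mixed-effects models, which the abstract also mentions, are a separate alternative.

```python
import math

def ols_slope_with_ses(x, y, clusters=None):
    """Simple linear regression slope with classical and sandwich standard errors.

    clusters=None gives HC0 (each observation is its own cluster); passing one
    cluster label per observation gives the CR0 cluster-robust estimator.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Classical SE assumes independent errors with constant variance.
    classical_se = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    if clusters is None:
        clusters = range(n)
    # Sum the slope's score contributions (dx * residual) within each cluster.
    score_sums = {}
    for g, d, e in zip(clusters, dx, resid):
        score_sums[g] = score_sums.get(g, 0.0) + d * e
    robust_se = math.sqrt(sum(s * s for s in score_sums.values())) / sxx
    return slope, classical_se, robust_se

x = [1, 2, 3, 4, 5, 6]
y = [3, 3, 6, 8, 9, 13]          # 2x plus residuals [1, -1, 0, 0, -1, 1]
print(ols_slope_with_ses(x, y))
print(ols_slope_with_ses(x, y, clusters=[0, 0, 1, 1, 2, 2]))
```

A large gap between the classical and cluster-robust standard errors on real data is a signal that the independence assumption matters for the conclusions.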

13
Fine-Tuning PubMedBERT for Hierarchical Condition Category Classification

Wang, X.; Hammarlund, N.; Prosperi, M.; Zhu, Y.; Revere, L.

2026-04-15 health systems and quality improvement 10.64898/2026.04.13.26350814 medRxiv
Top 0.2%
0.9%

Automating Hierarchical Condition Category (HCC) assignment directly from unstructured electronic health record (EHR) notes remains an important but understudied problem in clinical informatics. We present HCC-Coder, an end-to-end NLP system that maps narrative documentation to 115 Centers for Medicare & Medicaid Services (CMS) HCC codes in a multi-label setting. On the test dataset, HCC-Coder achieves a macro-F1 of 0.779 and a micro-F1 of 0.756, with a macro-sensitivity of 0.819 and a macro-specificity of 0.998. By contrast, Generative Pre-trained Transformer (GPT)-4o achieves its best scores under five-shot prompting, with a macro-F1 of 0.735 and a micro-F1 of 0.708. The fine-tuned model demonstrates consistent absolute improvements of 4-5 percentage points in F1 scores over GPT-4o. To address severe label imbalance, we incorporate inverse-frequency weighting and per-label threshold calibration. These findings suggest that domain-adapted transformers provide more balanced and reliable performance than prompt-based large language models for hierarchical clinical coding and risk adjustment.
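Macro- and micro-F1 weight rare codes very differently, which matters with 115 imbalanced HCC labels. The following minimal Python sketch is not HCC-Coder's code. It computes both metrics from per-document label sets and adds hypothetical versions of the two imbalance tools the abstract names:

- Inverse-frequency label weights, using the common `total / (n_labels * count)` convention.
- Per-label threshold selection by maximizing F1 on validation scores.

The paper's exact weighting formula and calibration procedure are not stated, so both are assumptions.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); 0 when the label is never true nor predicted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_micro_f1(y_true, y_pred, labels):
    """y_true / y_pred: one set of label codes per document."""
    per_label = []
    total_tp = total_fp = total_fn = 0
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if lab in t and lab in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if lab not in t and lab in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if lab in t and lab not in p)
        per_label.append(f1_from_counts(tp, fp, fn))
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
    macro = sum(per_label) / len(per_label)            # every label counts equally
    micro = f1_from_counts(total_tp, total_fp, total_fn)  # frequent labels dominate
    return macro, micro

def inverse_frequency_weights(label_counts):
    """Up-weight rare labels: weight = total / (number_of_labels * count)."""
    total = sum(label_counts.values())
    return {lab: total / (len(label_counts) * c) for lab, c in label_counts.items()}

def best_threshold(scores, truths):
    """Pick the score cut-off that maximizes F1 for one label on validation data."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, truths) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, truths) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, truths) if s < t and y)
        f1 = f1_from_counts(tp, fp, fn)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

For example, `macro_micro_f1([{0, 1}, {1}, {2}, {0}], [{0}, {1, 2}, {2}, {0, 1}], [0, 1, 2])` gives a macro-F1 of about 0.722 and a micro-F1 of 8/11 (about 0.727).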

14
Designing national programs for expanded carrier screening: Results from a discrete-choice experiment in Singapore

Blythe, R.; Senanayake, S.; Bylstra, Y.; Roberts, J.; Choi, C.; Yeo, M. J.; Goh, J.; Graves, N.; Koh, A. L.; Jamuar, S. S.

2026-04-13 health economics 10.64898/2026.04.09.26350563 medRxiv
Top 0.2%
0.9%

Background: Carrier screening for inherited genetic disorders can reduce the burden of conditions that lead to childhood morbidity and mortality, including thalassaemia, cystic fibrosis, and spinal muscular atrophy. To be successful, national carrier screening programs should aim to maximise uptake, which may depend on population preferences for screening characteristics. In this study, we aimed to determine how expanded carrier screening in Singapore should be designed based on operational factors including suggested copayments, wait times, and disorders included in screening panels. Methods: We elicited stated preferences for the design of a hypothetical national carrier screening program with seven attributes from 500 Singaporeans of reproductive age (18 to 54). A discrete choice experiment was applied using 30 choice tasks with 3 alternatives per task, divided between 3 blocks. A mixed multinomial logit model was used to estimate willingness-to-pay for each attribute level. Predicted uptake for three plausible screening programs was assessed, with copayment amounts from $0 to $1,200 in increments of $30. Impact on the annual national budget was calculated as a function of 25,000 expected eligible couples per year. All costs were reported in 2026 SGD. Results: Respondents showed the strongest preferences for cost, followed by the number of diseases included in the panel, then wait times, with limited impact of the remaining attributes. With no copayments, predicted uptake ranged from 85% [95% CI: 83% to 87%] to 90% [88% to 92%] for the basic and utility-maximising screening programs, respectively. This declined to 61% [56% to 66%] and 69% [65% to 73%], respectively, at a copayment of $1,200 per test. The model predicted higher uptake if a selection of screening alternatives were available, compared with a single program. 
The budget impact was highly dependent on population eligibility, copayments, and couples' decision-making processes, but was unlikely to exceed $22.5m [$19.0m to $26.6m] per year unless eligibility was expanded beyond married couples. Conclusions: There was high predicted demand for carrier screening even as copayments increased. Successful strategies to improve uptake may include reducing copays and wait times, increasing the number of screening options available to prospective parents, and increasing program eligibility beyond pre-conception married couples.
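Willingness-to-pay, predicted uptake and budget impact in a discrete choice experiment all follow from the estimated utility coefficients. The following minimal Python sketch uses a fixed-coefficient binary logit (screening versus opting out) rather than the study's mixed multinomial logit. Under the mixed logit, willingness-to-pay and uptake would be averaged over the distribution of random coefficients. All coefficients, the per-couple programme cost and the treatment of the copayment as a per-couple amount are hypothetical.

```python
import math

def willingness_to_pay(beta_attr, beta_cost):
    """Marginal WTP for an attribute level: -beta_attr / beta_cost."""
    return -beta_attr / beta_cost

def predicted_uptake(utility_screen, utility_optout=0.0):
    """Logit probability of choosing screening over opting out."""
    return 1.0 / (1.0 + math.exp(utility_optout - utility_screen))

def uptake_curve(asc, beta_cost, max_copay=1200, step=30):
    """Uptake at each copayment from $0 to max_copay in fixed increments."""
    return [(c, predicted_uptake(asc + beta_cost * c))
            for c in range(0, max_copay + 1, step)]

def annual_budget(uptake, programme_cost, copay, eligible_couples=25_000):
    """Public outlay: screened couples x (programme cost - copayment)."""
    return eligible_couples * uptake * (programme_cost - copay)

curve = uptake_curve(asc=2.0, beta_cost=-0.001)
print(curve[0], curve[-1])
```

The grid has 41 copayment points ($0, $30, ..., $1,200), matching the design described above. Under any negative cost coefficient, uptake falls monotonically as the copayment rises.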

15
AI Implementation in Safety Net Healthcare: Understanding Barriers and Strategies

Thomas, C.; Kim, J. Y.; Hasan, A.; Kpodzro, S.; Cortes, J.; Day, B.; Jensen, S.; L'Huillier, S.; Oden, M. O.; Zumbado Segura, S.; Maurer, E. W.; Tucker, S.; Robinson, S.; Garcia, B.; Muramalla, E.; Lu, S.; Chawla, N.; Patel, M.; Balu, S.; Sendak, M.

2026-04-11 health systems and quality improvement 10.64898/2026.04.07.26350351 medRxiv
Top 0.2%
0.9%

Safety net healthcare delivery organizations (SNOs) serve vulnerable populations but face persistent challenges in adopting new technologies, including AI. While systematic barriers to technology adoption in SNOs are well documented, little is known about how AI is implemented in these settings. This study explored real-world AI adoption in SNOs, focusing on identifying barriers encountered across the AI lifecycle and strategies used to overcome them. Five SNOs in the U.S. participated in a 12-month technical assistance program, the Practice Network, to implement AI tools of their choosing. Observed barriers and mitigation strategies were documented throughout program activities and, at the conclusion of the program, reviewed and refined with participants using a participatory research approach to ensure findings reflected lived experiences and organizational contexts. Key barriers emerged during the Integration and Lifecycle Management phases and included gaps in AI performance evaluation and impact assessments, communication with patients about AI use, foundational AI education, financial resources for purchasing and maintaining AI tools, and AI governance structures. Effective strategies for addressing these barriers were primarily supported through centralized expertise, structured guidance, and peer learning. These findings provide granular, actionable insights for SNO leaders, offering guidance for anticipating barriers and proactively planning mitigation strategies. By including SNO perspectives, the study also contributes to the broader health AI ecosystem and underscores the importance of participatory, collaborative approaches to support safe, effective, and ethical AI adoption in resource-constrained settings. Author Summary: Safety net organizations (SNOs) are healthcare systems that primarily serve low-income and underinsured patients. 
While interest in artificial intelligence (AI) in healthcare has grown rapidly, little is known about how these organizations experience AI adoption in practice. In this study, we partnered with five SNOs over a 12-month program to document the challenges they encountered when implementing AI tools and the strategies they used to address them. We worked closely with SNO staff throughout the process to ensure our findings reflected their lived experiences with AI implementation. We found that the most common challenges arose when organizations tried to integrate AI into daily operations and monitor and maintain those tools over time. Specific barriers included difficulty evaluating whether AI was performing as expected, limited guidance on communicating with patients about AI use, a lack of resources for staff training, limited financial resources, and the absence of formal governance structures. Successful strategies for overcoming these challenges drew on shared knowledge and structured support provided by the program, as well as learning from peer organizations. These findings offer practical guidance for SNO leaders planning or managing AI adoption, and contribute to a broader conversation about what is required to implement AI safely and effectively in healthcare settings that serve the most medically and socially vulnerable patients.

16
Determining context-specific economically feasible age ranges for female HPV catch-up vaccination in LMICs: a model-based health economic assessment

Wondimu, A.; Georges, D.; Macacu, A.; Wittenauer, R.; Fuady, A.; Gini, A.; Baussano, I.; Man, I.

2026-03-27 health economics 10.64898/2026.03.26.26348394 medRxiv
Top 0.2%
0.9%

Background: Catch-up vaccination will be pivotal for achieving WHO's cervical cancer elimination goals in low- and middle-income countries (LMICs). We assessed the health-economic impact of catch-up HPV vaccination for females in LMICs. Methods: Using IARC's METHIS modelling platform and data from 132 LMICs, we simulated HPV catch-up vaccination beyond the primary target age, varying the maximum age up to 30 years. Budget impact was expressed as a share of national five-year immunization budgets and of current health expenditure. We conducted cost-effectiveness analyses for a smaller subset of countries for which high-quality cervical cancer treatment costs were available. Findings: Catch-up HPV vaccination up to age 30 in LMICs could prevent 9.2 million cervical cancer cases over the lifetime among females aged 9-30 years. Across countries, budget impact ranged from 0.007% to 2.24% of five-year health expenditure and from 0.002% to 236.65% of immunization budgets, with vaccine procurement accounting for about 70% of costs. Gavi support could reduce costs by nearly 70% for catch-up up to age 18. Catch-up vaccination up to age 30 was cost-effective in almost all evaluated countries; in the one exception, it was cost-effective only up to age 21. Interpretation: In LMICs, once adequate coverage is achieved in the primary target group (9-14 years), expanding HPV catch-up vaccination would be impactful and cost-effective. Sustainable financing, Gavi support, and cost-minimization strategies are crucial for successful catch-up programmes and progress toward cervical cancer elimination.
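The abstract reports two kinds of output: budget impact as a share of a reference budget, and the highest catch-up age that remains cost-effective. A minimal sketch of that arithmetic follows. It is not the METHIS model. It assumes nested strategies, ordered by maximum catch-up age, each with a hypothetical total cost and DALYs averted, and a hypothetical willingness-to-pay threshold `wtp`. All function names are illustrative.

```python
from typing import List, Tuple

# Hypothetical representation: (maximum catch-up age, total discounted cost,
# DALYs averted relative to no catch-up). Not the METHIS data structures.
Strategy = Tuple[int, float, float]


def icer(prev: Strategy, cur: Strategy) -> float:
    """Incremental cost per DALY averted of `cur` compared with `prev`."""
    d_cost = cur[1] - prev[1]
    d_effect = cur[2] - prev[2]
    if d_effect <= 0:
        return float("inf")  # no additional health gain: treat as not cost-effective
    return d_cost / d_effect


def highest_cost_effective_age(strategies: List[Strategy], wtp: float) -> int:
    """Extend the upper catch-up age step by step (strategies sorted by age,
    first entry = comparator) and stop at the first step whose incremental
    ICER exceeds the willingness-to-pay threshold `wtp`.
    Simplification: ignores extended dominance."""
    best_age = strategies[0][0]
    for prev, cur in zip(strategies, strategies[1:]):
        if icer(prev, cur) > wtp:
            break
        best_age = cur[0]
    return best_age


def budget_share_pct(programme_cost: float, reference_budget: float) -> float:
    """Programme cost as a percentage of a reference budget
    (e.g. a five-year immunization budget or current health expenditure)."""
    return 100.0 * programme_cost / reference_budget


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    options = [(14, 0.0, 0.0), (18, 1000.0, 10.0), (21, 2500.0, 20.0), (26, 6000.0, 25.0)]
    print(highest_cost_effective_age(options, wtp=200.0))  # incremental ICERs 100, 150, 700 -> 21
    print(budget_share_pct(5_000.0, 200_000.0))  # -> 2.5
```

Because the share is taken against a fixed budget, it can exceed 100% when a programme costs more than the reference budget. That is how a figure like 236.65% of an immunization budget can arise.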

17
Bridging the Coverage Gap: State Medicaid Limitations for Cardiac Rehabilitation Programs and the Risk to Disadvantaged Communities

Henson, J. C.; Spears, G. L.; Daughdrill, B. K.; Hagood, J. N.; Vallurupalli, S.

2026-04-05 health policy 10.64898/2026.04.03.26350136 medRxiv
Top 0.3%
0.8%

Background: Cardiac rehabilitation (CR) is a cost-effective, evidence-based intervention that improves outcomes for patients with heart failure (HF), yet access remains inequitable, particularly among Medicaid enrollees. This study evaluates the state-by-state variability in Medicaid coverage for CR services and examines the implications for health equity in vulnerable populations. Methods: We conducted a cross-sectional policy analysis of all 50 U.S. states to assess Medicaid coverage for outpatient CR services billed under CPT codes 93797 (without ECG monitoring) and 93798 (with ECG monitoring). Publicly available Medicaid documents were reviewed and supplemented with direct communication with state Medicaid agencies. States were categorized as having full, partial/inconclusive, or no coverage. Geographic trends were visualized through heat maps and contextualized using state-level Medicaid enrollment data. Results: Marked disparities in CR coverage were identified. Only 41 states reimbursed CPT 93797, and 43 reimbursed CPT 93798. Eight states lacked coverage for either code, predominantly in the South and Mountain West, including Arkansas, Georgia, Louisiana, Mississippi, Nevada, and Utah. States with the highest Medicaid enrollment (e.g., Louisiana, Arkansas) often provided no CR coverage, compounding access barriers for high-risk, low-income populations. Conclusions: The absence of standardized Medicaid coverage for CR contributes to systemic inequities in cardiovascular care, disproportionately affecting disadvantaged communities. Aligning Medicaid policies to ensure universal CR access, particularly through tele-rehabilitation and value-based care models, could reduce hospitalizations, improve survival, and promote health equity across the U.S.

18
What Does It Take to Map a Country? Scaling OpenStreetMap Mapping for Accurate Health Accessibility Modelling in Madagascar

Ihantamalala, F.; Ravaoarimanga, M.; Randriahamihaja, M.; Revillion, C.; Longour, L.; Randrianjatovo, T.; Rafenoarimalala, F. H.; Bonds, M. H.; Finnegan, K. E.; Herbreteau, V.; Rakotomanana, F.; Garchitorena, A.

2026-03-27 health policy 10.64898/2026.03.25.26349339 medRxiv
Top 0.3%
0.8%

Comprehensive geographic data are essential to accurately model geographic accessibility to healthcare and to guide equitable health system planning and implementation. In low-income countries, however, incomplete road and building data in global databases such as OpenStreetMap (OSM) limit the precision and operational applications of geographic accessibility models. Following a successful pilot in one district of Madagascar, we evaluated the scalability of an exhaustive mapping approach to produce highly granular, household-level accessibility estimates at regional and national levels. Using satellite imagery and the OSM platform, we mapped all buildings, roads, footpaths, and rice fields across seven additional districts in southeastern Madagascar. We estimated travel routes, distance, and travel time between each household and the nearest primary health center (PHC) or community health site (CHS) using the Open Source Routing Machine (OSRM), combined with travel-speed predictions from a locally calibrated statistical model. We then assessed population density and mapping completeness for roads and buildings in our study area and across Madagascar using AI-generated reference datasets (Microsoft and Facebook/MapWithAI) and estimated the corresponding mapping times. Finally, we estimated the resources required, in person-years, to scale this approach across Madagascar using two different extrapolation methods. Nearly one and a half million buildings and 197,000 km of footpaths were added to OSM across the eight mapped districts, covering a total area of about 30,200 km². Depending on the district, between 24% and 65% of the population lived within one hour of a PHC, and 87%-99% lived within one hour of a CHS. Most Malagasy districts were classified as having low completeness for both buildings and roads. Scaling up the approach to cover the entire country would require between 220 and 350 person-years, depending on the extrapolation method and assumptions used.
Mapping an entire country with sufficient detail to precisely model healthcare accessibility for every household is feasible but resource-intensive. Combining human mapping, participatory approaches, and AI-assisted datasets can substantially improve OSM completeness and generate actionable, high-resolution travel-time data for health planning. Our findings provide a roadmap for Madagascar and other countries seeking to develop national-scale geospatial infrastructure for sustainable development and universal health coverage.
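The core accessibility step is estimating each household's travel time to the nearest facility and then the share of the population within one hour. The sketch below shows that logic with a multi-source Dijkstra search over a toy road/footpath graph. It is a simplified stand-in, not OSRM or the authors' calibrated speed model. It assumes hypothetical segment speeds that are the same in both directions, and it treats households as graph nodes. The names `add_path`, `minutes_to_nearest`, and `share_within` are illustrative.

```python
import heapq
from typing import Dict, List, Tuple

# node -> list of (neighbour, segment length in km, assumed travel speed in km/h)
Graph = Dict[str, List[Tuple[str, float, float]]]


def add_path(graph: Graph, a: str, b: str, length_km: float, speed_kmh: float) -> None:
    """Add an undirected road or footpath segment (same speed in both directions)."""
    graph.setdefault(a, []).append((b, length_km, speed_kmh))
    graph.setdefault(b, []).append((a, length_km, speed_kmh))


def minutes_to_nearest(graph: Graph, facilities: List[str]) -> Dict[str, float]:
    """Multi-source Dijkstra: travel time in minutes from every reachable node
    to its nearest facility. Searching outward from facilities is valid here
    only because every segment is symmetric."""
    dist: Dict[str, float] = {f: 0.0 for f in facilities}
    heap = [(0.0, f) for f in facilities]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry
        for nbr, length_km, speed_kmh in graph.get(node, []):
            nd = d + 60.0 * length_km / speed_kmh
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def share_within(times: Dict[str, float], population: Dict[str, int],
                 limit_min: float = 60.0) -> float:
    """Fraction of the population whose household lies within `limit_min` minutes."""
    total = sum(population.values())
    covered = sum(n for hh, n in population.items()
                  if times.get(hh, float("inf")) <= limit_min)
    return covered / total


if __name__ == "__main__":
    g: Graph = {}
    add_path(g, "PHC", "A", 3.0, 6.0)    # 30 min on foot
    add_path(g, "A", "B", 4.0, 4.0)      # a further 60 min
    add_path(g, "PHC", "C", 10.0, 20.0)  # 30 min by road
    times = minutes_to_nearest(g, ["PHC"])
    print(times["B"])                                        # 90.0
    print(share_within(times, {"A": 5, "B": 3, "C": 2}))     # 0.7
```

Slope- or direction-dependent speeds would break the symmetry assumption. In that case the search would need to run over reversed edges.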

19
Governing Decisions of Probability Cutoffs in Clinical AI Deployment: A Case Study of Asthma Exacerbation Prediction

Zheng, L.; Agnikula Kshatriya, B. S.; Ohde, J.; Rost, L.; Malik, M.; Peterson, K.; Brereton, T.; Loufek, B.; Pereira, T.; Gai, C.; Park, M.; Hartz, M.; Fladager-Muth, J.; Wi, C.-I.; Tao, C. J.; Garovic, V.; Juhn, Y. J.; Overgaard, S. M.

2026-03-22 health informatics 10.64898/2026.03.18.26348562 medRxiv
Top 0.3%
0.8%

Models that estimate the probability of an adverse clinical outcome require an operational cutoff to translate continuous estimated probabilities into discrete labels that can trigger clinical action. Although statistical methods can identify optimal cutoffs, threshold selection ultimately reflects value judgments regarding harm tolerance, resource allocation, and workflow feasibility. We describe a governance-informed approach to selecting a deployment threshold for an asthma exacerbation (AE) prediction model integrated into clinical workflows. Using prevalence-adjusted performance metrics and real-world provider capacity modeling, we evaluated multiple candidate thresholds and quantified downstream workload and missed-event trade-offs. We demonstrate that statistically optimal thresholds may produce operationally infeasible alert volumes or unacceptable miss rates. We propose a structured threshold governance framework integrating statistical performance, clinical utility, stakeholder input, and human oversight safeguards. This case illustrates how threshold decisions should be treated as organizational governance processes rather than purely technical optimizations.
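The abstract describes tabulating, for each candidate threshold, the alert volume, the missed events, and whether the workload fits provider capacity. The sketch below shows that tabulation, plus the standard Bayes re-expression of PPV at a deployment prevalence. It is not the authors' code. The capacity figure, the toy probabilities, and all names are hypothetical, and choosing a row from the table is left to the governance process the authors describe.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ThresholdResult:
    threshold: float
    alerts: int          # patients flagged (downstream workload)
    true_positives: int
    missed_events: int   # events scored below the threshold
    sensitivity: float
    ppv: float
    within_capacity: bool


def evaluate_thresholds(probs: Sequence[float], outcomes: Sequence[int],
                        thresholds: Sequence[float], capacity: int) -> List[ThresholdResult]:
    """Tabulate workload and missed-event trade-offs for each candidate cutoff."""
    events = sum(outcomes)
    results = []
    for t in thresholds:
        flagged = [y for p, y in zip(probs, outcomes) if p >= t]
        alerts, tp = len(flagged), sum(flagged)
        results.append(ThresholdResult(
            threshold=t,
            alerts=alerts,
            true_positives=tp,
            missed_events=events - tp,
            sensitivity=tp / events if events else 0.0,
            ppv=tp / alerts if alerts else 0.0,
            within_capacity=alerts <= capacity,
        ))
    return results


def prevalence_adjusted_ppv(sens: float, spec: float, prevalence: float) -> float:
    """PPV re-expressed at a deployment prevalence via Bayes' theorem."""
    tp = sens * prevalence
    fp = (1.0 - spec) * (1.0 - prevalence)
    return tp / (tp + fp) if tp + fp > 0 else 0.0


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1]   # toy model scores
    outcomes = [1, 0, 1, 0, 1, 0, 0]              # toy observed exacerbations
    for row in evaluate_thresholds(probs, outcomes, [0.5, 0.25], capacity=4):
        print(row)
```

In this toy run, lowering the cutoff from 0.5 to 0.25 removes the one missed event but pushes alerts from 3 to 5, past the assumed capacity of 4. This is the kind of trade-off the abstract says a purely statistical optimum can hide.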

20
Performance optimization of an R Shiny-based digital health dashboard for monitoring small and sick newborn care in low-resource hospital settings

Thomas, J.; Jenkins, G.; Chen, J.; Ogero, M.; Malla, L.; Hirschhorn, L. R.; Richards-Kortum, R.; Oden, Z. M.; Bohne, C.; Wainaina, J.

2026-03-19 health systems and quality improvement 10.64898/2026.03.08.26347893 medRxiv
Top 0.3%
0.7%

Background: Digital health dashboards can enhance health system performance by transforming routinely collected data into actionable insights for decision-making. In low-resource settings, however, their effectiveness depends not only on the relevance of indicators but also on system reliability within constrained digital infrastructure. Neonatal mortality remains a major global health challenge, with the highest burden in low- and middle-income countries, where many deaths are preventable through timely, evidence-based interventions. Continuous monitoring of care processes and outcomes is therefore essential. To support this need, we developed the NEST360 Implementation Tracker (NEST-IT) using R Shiny to support quality improvement across more than 100 hospitals in sub-Saharan Africa. As the platform scaled to over half a million records and increasing concurrent users, performance constraints emerged, particularly in hospitals with limited computing resources, threatening timely access to critical information. Objective: This study aimed to describe optimization strategies applied to the NEST-IT dashboard and evaluate their impact before and after implementation. Methods: A structured optimization process was implemented following established R Shiny performance principles. Dashboard profiling was first conducted to identify key bottlenecks, after which targeted improvements were applied to improve efficiency and responsiveness. A quasi-experimental pre-post evaluation (December 2023-August 2024) assessed performance using three indicators: server processing time, visualization rendering time (VRT), and Time to First Byte (TTFB). Metrics were measured repeatedly during one-month baseline and post-optimization periods and summarized using mean values. Results: Four primary bottlenecks were identified: delayed server responses, slow visualization rendering, inefficient data handling, and inconsistent device performance.
Following optimization, interactive plot load time decreased from 10.1 to 2.7 ± 0.6 seconds (73.3% improvement). Visualization rendering time improved from 3.61 to 1.62 seconds, while server processing time fell from 2.3 ± 0.7 to 0.8 ± 0.3 seconds. TTFB improved from 1.9 ± 0.4 to 0.6 ± 0.2 seconds, and system uptime increased from 92.5% to 99.2%. Conclusion: Performance optimization substantially improved dashboard responsiveness, enabling timely access to critical neonatal information in resource-constrained hospital settings. The findings provide a practical, evidence-based framework for improving the performance of R Shiny dashboards and demonstrate scalable strategies for delivering reliable digital decision-support tools in low-resource health systems.
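The pre-post evaluation rests on two pieces of arithmetic: summarizing repeated timings by their mean (and spread), and expressing the change as a relative improvement. The sketch below shows both. It is a Python analogue, not the authors' R Shiny profiling setup. The timing helper `time_repeated` is a hypothetical illustration of repeated measurement, and the percentages are computed only from the mean values reported above.

```python
import time
from statistics import mean, stdev
from typing import Callable, List, Sequence, Tuple


def time_repeated(fn: Callable[[], object], repeats: int = 5) -> List[float]:
    """Wall-clock duration in seconds of `repeats` calls to `fn`."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (needs at least two samples)."""
    return mean(samples), stdev(samples)


def pct_improvement(before: float, after: float) -> float:
    """Relative reduction in a time metric, in percent."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Mean values reported in the abstract above (seconds).
    reported = [("plot load", 10.1, 2.7), ("VRT", 3.61, 1.62),
                ("server", 2.3, 0.8), ("TTFB", 1.9, 0.6)]
    for label, before, after in reported:
        print(label, round(pct_improvement(before, after), 1))
```

The first row reproduces the reported 73.3%. The same formula applied to the other reported means gives about 55% for visualization rendering, 65% for server processing time, and 68% for TTFB. These are derived figures, not values stated by the authors.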